Abstract. Given a directed graph $G$ with non-negative costs on the arcs, a directed tree cover
of $G$ is a rooted directed tree $T$ such that the head or the tail (or both) of every arc in $G$
is touched by $T$. The minimum directed tree cover problem (DTCP) is to find a directed tree
cover of minimum cost. The problem is known to be NP-hard. In this paper, we show that
the weighted Set Cover Problem (SCP) is a special case of DTCP. Hence, one can expect at
best to approximate DTCP with the same ratio as for SCP. We show that this expectation
can be met, in a sense, by designing a purely combinatorial approximation algorithm
for DTCP and proving that its approximation ratio is $\max\{2, \ln(D^+)\}$,
where $D^+$ is the maximum outgoing degree of the nodes in $G$.

1 Introduction 

Let $G = (V, A)$ be a directed graph with a (non-negative) cost function $c : A \to \mathbb{Q}_+$ defined on the
arcs. Let $c(u, v)$ denote the cost of the arc $(u, v) \in A$. A directed tree cover is a weakly connected
subgraph $T = (U, F)$ such that

1. for every $e \in A$, $F$ contains an arc $f$ intersecting $e$, i.e. $f$ and $e$ have an end-node in common.

2. $T$ is a rooted branching.

The minimum directed tree cover problem (DTCP) is to find a directed tree cover of minimum cost. 
Several related problems to DTCP have been investigated, in particular: 

— its undirected counterpart, the minimum tree cover problem (TCP) and 

— the tour cover problem in which T is a tour (not necessarily simple) instead of a tree. This 
problem has also two versions: undirected (ToCP) and directed (DToCP). 

We first discuss TCP, which has been intensively studied in recent years. The TCP was introduced
in a paper by Arkin et al. [1], who were motivated by a problem of locating tree-shaped
facilities on a graph such that all the nodes are dominated by the chosen facilities. They proved the
NP-hardness of TCP by observing that the unweighted case of TCP is equivalent to the connected
vertex cover problem, which in fact is known to be as hard (to approximate) as the vertex cover
problem [10]. Consequently, DTCP is also NP-hard, since an instance of TCP can easily be transformed
into an instance of DTCP by replacing every edge by the two arcs of opposite directions between its
two end-nodes. In their paper, Arkin et al. presented a 2-approximation algorithm
for the unweighted case of TCP, as well as a 3.5-approximation algorithm for general costs. Later,
Könemann et al. [11] and Fujito [8] independently designed a 3-approximation algorithm for TCP
using a bidirected formulation. They solved a linear program (of exponential size) to find a vertex
cover $U$ and then found a Steiner tree with $U$ as the set of terminals. Recently, Fujito [9] and
Nguyen [13] separately proposed two different approximation algorithms achieving a ratio of 2, currently
the best approximation ratio. Actually, the algorithm in [13] is stated for the TCP when costs satisfy
the triangle inequality, but one can assume this in the general case without loss of generality. The
algorithm in [9] is very interesting in terms of complexity since it is a primal-dual based algorithm and
thus purely combinatorial. In the prospective sections of [11] and [9], the authors presented DTCP
as a wide open problem for further research on the topic. In particular, Fujito [9] pointed out that
his approach for TCP can be extended to give a 2-approximation algorithm for the unweighted case
of DTCP but falls short once arbitrary costs are allowed.

For ToCP, a 3-approximation algorithm has been developed in [11]. The principle of this algorithm
is similar to that for TCP, i.e. it solves a linear program (of exponential size) to find a vertex cover $U$
and then finds a traveling salesman tour over the subgraph induced by $U$. Recently, Nguyen [13]
considered DToCP and extended the approach in [11] to obtain a $2\log_2(n)$-approximation algorithm
for DToCP. We can similarly adapt the method in [11] for TCP to DTCP, but we then have to find a
directed Steiner tree with a vertex cover $U$ as the terminal set. Using the best known approximation
algorithm by Charikar et al. [4] for the minimum directed Steiner tree problem, we obtain a ratio
of $(1 + |U|^{2/3} \log^{1/3}(|U|))$ for DTCP, which is worse than a logarithmic ratio.
In this paper, we improve this ratio by giving a logarithmic ratio approximation algorithm for
DTCP. In particular, we show that the weighted Set Cover Problem (SCP) is a special case of
DTCP and that the transformation is approximation preserving. Based on the known complexity results
for SCP, we can only expect a logarithmic ratio for the approximation of DTCP. Letting $D^+$ be the
maximum outgoing degree of the nodes in $G$, we design a primal-dual $\max\{2, \ln(D^+)\}$-approximation
algorithm for DTCP, which is thus somewhat best possible.

The paper is organized as follows. In the remainder of this section, we define the notation that
will be used in the paper. In Section 2, we present an integer formulation and state a primal-dual
algorithm for DTCP. Finally, we prove the validity of the algorithm and its approximation ratio.
Let $G = (V, A)$ be a digraph with
vertex set $V$ and arc set $A$. Let $n = |V|$ and $m = |A|$. If $x \in \mathbb{Q}^{|A|}$ is a vector indexed by the arc
set $A$ and $F \subseteq A$ is a subset of arcs, we use $x(F)$ to denote the sum of the values of $x$ on the arcs in
$F$, $x(F) = \sum_{e \in F} x_e$. Similarly, for a vector $y \in \mathbb{Q}^{|V|}$ indexed by the nodes and a subset
of nodes $S \subseteq V$, let $y(S)$ denote the sum of the values of $y$ on the nodes in the set $S$. For a subset of nodes
$S \subseteq V$, let $A(S)$ denote the set of the arcs having both end-nodes in $S$. Let $\delta^+(S)$ (respectively
$\delta^-(S)$) denote the set of the arcs having only the tail (respectively head) in $S$. We call $\delta^+(S)$
the outgoing cut associated to $S$ and $\delta^-(S)$ the ingoing cut associated to $S$. For two subsets $U, W \subset V$
such that $U \cap W = \emptyset$, let $(U : W)$ be the set of the arcs having the tail in $U$ and the head in $W$. For
$u \in V$, we say that $v$ is an outneighbor (respectively inneighbor) of $u$ if $(u, v) \in A$ (respectively $(v, u) \in A$).
For the sake of simplicity, in clear contexts, the singleton $\{u\}$ will be denoted simply by $u$.
For an arc subset $F$ of $A$, let $V(F)$ denote the set of end-nodes of all the arcs in $F$. We say $F$ covers
a vertex subset $S$ if $F \cap \delta^-(S) \neq \emptyset$. We say $F$ is a cover for the graph $G$ if for every arc $(u, v) \in A$, we
have $F \cap \delta^-(\{u, v\}) \neq \emptyset$.

When we work on more than one graph, we specify the graph in the index of the notation, e.g.
$\delta^+_G(S)$ will denote $\delta^+(S)$ in the graph $G$. By default, notation without indication of the
graph in the index applies to $G$.
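
The cut and cover notation above can be made concrete with a minimal Python sketch. The function names (`delta_in`, `delta_out`, `is_cover`) and the representation of arcs as `(tail, head)` tuples are assumptions made for illustration only; they are not part of the paper.

```python
def delta_in(arcs, S):
    """Ingoing cut: arcs whose head is in S and whose tail is outside S."""
    S = set(S)
    return {(u, v) for (u, v) in arcs if u not in S and v in S}


def delta_out(arcs, S):
    """Outgoing cut: arcs whose tail is in S and whose head is outside S."""
    S = set(S)
    return {(u, v) for (u, v) in arcs if u in S and v not in S}


def is_cover(arcs, F):
    """F is a cover if, for every arc (u, v), some arc of F enters {u, v}."""
    F = set(F)
    return all(F & delta_in(arcs, {u, v}) for (u, v) in arcs)


# Illustrative data: a directed triangle a -> b -> c -> a.
triangle = {("a", "b"), ("b", "c"), ("c", "a")}
print(delta_in(triangle, {"a", "b"}))   # the single arc entering {a, b}
print(is_cover(triangle, triangle))
```

Note that this follows the definition literally, so an arc whose end-node pair has an empty ingoing cut can never be covered.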



2 Minimum r-branching cover problem 



Suppose that $T$ is a directed tree cover of $G$ rooted in $r \in V$, i.e. $T$ is a branching, $V(T)$ is a
vertex cover in $G$ and there is a directed path in $T$ from $r$ to any other node in $V(T)$. In this
case, we call $T$ an r-branching cover. Thus, DTCP can be divided into $n$ subproblems, in each of which we
find a minimum r-branching cover for one root $r \in V$. By this observation, in this paper, we will focus
on approximating the minimum r-branching cover for a specific vertex $r \in V$. An approximation
algorithm for DTCP then simply results from applying the algorithm for the minimum
r-branching cover $n$ times, once for each $r \in V$, and keeping the cheapest solution found.
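
The reduction from DTCP to $n$ rooted subproblems can be sketched as a small wrapper. The callable `solve_for_root` is hypothetical: it stands for any routine that returns a pair `(cost, arcs)` for a given root, or `None` when no r-branching cover is found for that root.

```python
def best_tree_cover(nodes, solve_for_root):
    """Try every node as root and keep the cheapest r-branching cover.

    solve_for_root(r) is an assumed callback returning (cost, arcs) or None.
    """
    best = None
    for r in nodes:
        solution = solve_for_root(r)
        if solution is not None and (best is None or solution[0] < best[0]):
            best = solution
    return best
```

Since the wrapper only takes a minimum over the roots, any ratio guaranteed for the rooted problem carries over to DTCP.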

2.1 Weighted set cover problem as a special case 

Let us consider any instance A of the weighted Set Cover Problem (SCP) with a set $E = \{e_1, e_2, \ldots, e_p\}$
of ground elements, and a collection of subsets $S_1, S_2, \ldots, S_q \subseteq E$ with corresponding non-negative
weights $w_1, w_2, \ldots, w_q$. The objective is to find a set $I \subseteq \{1, 2, \ldots, q\}$ that minimizes $\sum_{i \in I} w_i$, such
that $\bigcup_{i \in I} S_i = E$. We transform this instance to an instance of the minimum r-branching cover
problem in some graph $G_1$ as follows. We create a node $r$, $q$ nodes $S_1, S_2, \ldots, S_q$ and $q$ arcs $(r, S_i)$
with weight $w_i$. We then add $2p$ new nodes $e_1, \ldots, e_p$ and $e'_1, \ldots, e'_p$. If $e_k \in S_i$ for some $1 \le k \le p$
and $1 \le i \le q$, we create an arc $(S_i, e_k)$ with weight 0 (or a very insignificant positive weight). At
last, we add an arc $(e_k, e'_k)$ of weight 0 (or a very insignificant positive weight) for each $1 \le k \le p$.
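
The transformation just described can be sketched in a few lines of Python. The node labels (`"r"`, `("S", i)`, `("e", k)`, `("e'", k)`) and the function name are assumptions chosen for illustration, and the sketch uses weight 0 rather than a tiny positive weight.

```python
def scp_to_branching_cover(elements, subsets, weights):
    """Build the arc-cost dictionary of G_1 from a weighted set cover instance.

    elements: iterable of ground elements
    subsets:  list of sets of ground elements
    weights:  list of non-negative weights, one per subset
    """
    arcs = {}
    for i, (subset, w) in enumerate(zip(subsets, weights)):
        arcs[("r", ("S", i))] = w                # arc (r, S_i) of weight w_i
        for e in subset:
            arcs[(("S", i), ("e", e))] = 0       # arc (S_i, e_k) of weight 0
    for e in elements:
        arcs[(("e", e), ("e'", e))] = 0          # arc (e_k, e'_k) of weight 0
    return arcs


g1 = scp_to_branching_cover([1, 2, 3], [{1, 2}, {2, 3}], [5, 7])
print(len(g1))
```

Every arc $(e_k, e'_k)$ can only be touched through $e_k$ or $e'_k$, and $e'_k$ is only reachable through $e_k$, which is what forces a cover to reach each ground element.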

Lemma 1. Any r-branching cover in $G_1$ corresponds to a set cover in A of the same weight and
vice versa.

Proof. Let us consider any r-branching cover $T$ in $G_1$. Since $T$ should cover all the arcs $(e_k, e'_k)$ for
$1 \le k \le p$, $T$ contains the nodes $e_k$. By the construction of $G_1$, these nodes are connected to $r$
only through the nodes $S_1, \ldots, S_q$ with the corresponding costs $w_1, \ldots, w_q$. Clearly, the nodes
$S_i$ in $T$ constitute a set cover in A of the same weight as $T$. It is then easy to see that any set cover
in A corresponds to an r-branching cover in $G_1$ of the same weight.

Let $D^+$ be the maximum outgoing degree of the nodes (except $r$) in $G_1$. We can see that $D^+ = p$,
the number of ground elements in A. Hence, we have

Corollary 1. Any $f(D^+)$-approximation algorithm for the minimum r-branching cover problem is
also an $f(p)$-approximation algorithm for SCP, where $f$ is a function from $\mathbb{N}$ to $\mathbb{R}$.

Note that the converse is not true. As a consequence of this corollary, we have the same complexity
results for the minimum r-branching cover problem as the known results for SCP [12, 7, 15, 2]. Precisely,

Corollary 2. 

— If there exists a $c \ln(D^+)$-approximation algorithm for the minimum r-branching cover problem
where $c < 1$ then $NP \subseteq DTIME(n^{O(\log^k(n))})$.

— There exists some $0 < c < 1$ such that if there exists a $c \log(D^+)$-approximation algorithm for
the minimum r-branching cover problem, then $P = NP$.





Note that this result does not contradict Fujito's result about an approximation ratio of 2 for the
unweighted DTCP, because in our transformation we use arcs of weight 0 (or a very insignificant
fractional positive weight), which do not occur in an instance of unweighted DTCP.
Hence, in some sense, the $\max\{2, \ln(D^+)\}$-approximation algorithm that we are going to describe
in the next sections seems to be best possible for the general weighted DTCP.

3 Integer programming formulation for minimum r-branching cover 

We use a formulation inspired by the one in [11], designed originally for the TCP. The formulation
is as follows: for a fixed root $r$, define $\mathcal{F}$ to be the set of all subsets $S$ of $V \setminus \{r\}$ such that $S$ induces
at least one arc of $A$,

\[ \mathcal{F} = \{ S \subseteq V \setminus \{r\} \mid A(S) \neq \emptyset \}. \]

Let $T$ be the arc set of a directed tree cover of $G$ containing $r$; $T$ is thus a branching rooted at $r$.
Now for every $S \in \mathcal{F}$, at least one node, say $v$, in $S$ should belong to $V(T)$. By definition of a
directed tree cover there is a path from $r$ to $v$ in $T$ and, as $r \notin S$, this path should contain at least
one arc in $\delta^-(S)$. This allows us to derive the following cut constraint, which is valid for the DTCP:

\[ \sum_{e \in \delta^-(S)} x_e \ge 1 \quad \text{for all } S \in \mathcal{F}. \]

This leads to the following IP formulation for the minimum r-branching cover.

\[
\begin{aligned}
\min\ & \sum_{e \in A} c(e)\, x_e \\
\text{s.t.}\ & \sum_{e \in \delta^-(S)} x_e \ge 1 && \text{for all } S \in \mathcal{F}, \\
& x \in \{0, 1\}^{A}.
\end{aligned}
\]

A trivial case for which this formulation has no constraint is when $G$ is an r-rooted star, but in this
case the optimal solution is trivially the central node $r$ with cost 0.
Replacing the integrality constraints by

\[ x \ge 0, \]

we obtain the linear programming relaxation. We use DTC($G$) to denote the convex hull of all
vectors $x$ satisfying the constraints above (with the integrality constraints replaced by $x \ge 0$). We express
below the dual of DTC($G$):

\[
\begin{aligned}
\max\ & \sum_{S \in \mathcal{F}} y_S \\
\text{s.t.}\ & \sum_{S \in \mathcal{F} \text{ s.t. } e \in \delta^-(S)} y_S \le c(e) && \text{for all } e \in A, \\
& y_S \ge 0 && \text{for all } S \in \mathcal{F}.
\end{aligned}
\]
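
For very small graphs, the cut constraints can be checked by brute force. The following Python sketch enumerates the family of node sets avoiding $r$ that induce at least one arc and reports those not entered by any chosen arc. The function names are illustrative, the enumeration is exponential, and the sketch assumes a graph without self-loops (so singletons are skipped).

```python
from itertools import combinations


def cut_family(nodes, arcs, r):
    """Yield every subset S of V \\ {r} that induces at least one arc."""
    others = [v for v in nodes if v != r]
    for size in range(2, len(others) + 1):
        for combo in combinations(others, size):
            S = frozenset(combo)
            if any(u in S and v in S for (u, v) in arcs):
                yield S


def violated_cuts(nodes, arcs, r, chosen):
    """Return the sets S of the family with no chosen arc in the ingoing cut of S."""
    chosen = set(chosen)
    return [S for S in cut_family(nodes, arcs, r)
            if not any(u not in S and v in S for (u, v) in chosen)]


nodes = ["r", "a", "b", "c"]
arcs = [("r", "a"), ("a", "b"), ("b", "c")]
print(violated_cuts(nodes, arcs, "r", {("r", "a"), ("a", "b")}))
```

An empty result means the 0/1 vector of the chosen arcs satisfies every cut constraint of the formulation.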


4 Approximating the minimum r-branching cover 



4.1 Preliminary observations and algorithm overview 

Preliminary observations. As we can see, the minimum r-branching cover is closely related to
the well-known minimum r-arborescence problem, which finds a minimum r-branching spanning
all the nodes in $G$. Edmonds [6] gave a linear programming formulation for this problem which
consists of the cut constraints for all the subsets $S \subseteq V \setminus \{r\}$ (not limited to $S \in \mathcal{F}$). He then
designed a primal-dual algorithm (also described in [5]) which repeatedly keeps and updates a set $A_0$
of arcs of zero reduced cost and the subgraph $G_0$ induced by $A_0$ and, at each iteration, tries to cover a
chosen strongly connected component in $G_0$ by augmenting (as much as possible with respect to
the current reduced cost) the corresponding dual variable. The algorithm ends when all the nodes
are reachable from $r$ in $G_0$. The crucial point in Edmonds' algorithm is that when there still
exist nodes not reachable from $r$ in $G_0$, there always exists in $G_0$ a strongly connected component
to be covered, because we can choose trivial strongly connected components which are singletons.
We cannot do such a thing for the minimum r-branching cover because a node may or may not belong
to an r-branching cover. But we shall see that if $G_0$ satisfies certain conditions, we can use an
Edmonds-style primal-dual algorithm to find an r-branching cover, and that to obtain a $G_0$ satisfying
such conditions, we have to pay a ratio of $\max\{2, \ln(n)\}$. Let us see what these conditions could be.
A node $j$ is said to be connected to another node $i$ (resp. a connected subgraph $B$) if there is a path from
$i$ (resp. a node in $B$) to $j$. Suppose that we have found a vertex cover $U$ and a graph $G_0$. We define
an Edmonds connected subgraph as a non-trivial connected (not necessarily strongly) subgraph $B$
of $G_0$ not containing $r$ such that, given any node $i \in B$, every $v \in B \cap U$ is connected to $i$
in $G_0$. Note that any strongly connected subgraph of $G_0$ not containing $r$ which contains at least
one node in $U$ is an Edmonds connected subgraph. As in the definition, for an Edmonds connected
subgraph $B$, we will also, by abuse of notation, use $B$ to denote its vertex set.
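
The definition of an Edmonds connected subgraph can be checked mechanically on small examples. The sketch below is one reading of the definition: "non-trivial" is interpreted as "at least two nodes", weak connectivity is tested on the subgraph induced by $B$, and paths from $i$ to $v$ may use any arc of $G_0$. These interpretations, and the function names, are assumptions rather than statements from the paper.

```python
def reachable(adj, start):
    """Nodes reachable from start (start included) in the adjacency map adj."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_edmonds_connected(B, arcs0, U, r):
    """Check the Edmonds connected subgraph conditions for node set B in G_0."""
    B = set(B)
    if len(B) < 2 or r in B:
        return False
    # Weak connectivity of the subgraph of G_0 induced by B.
    undirected = {v: set() for v in B}
    for u, v in arcs0:
        if u in B and v in B:
            undirected[u].add(v)
            undirected[v].add(u)
    if reachable(undirected, next(iter(B))) != B:
        return False
    # Every node of B must reach every node of B that lies in U.
    directed = {}
    for u, v in arcs0:
        directed.setdefault(u, set()).add(v)
    targets = B & set(U)
    return all(targets <= reachable(directed, i) for i in B)


arcs0 = {("a", "b"), ("c", "b")}
print(is_edmonds_connected({"a", "b", "c"}, arcs0, {"b"}, "r"))
```

In the example, $B = \{a, b, c\}$ is not strongly connected, yet it qualifies because the only node of $U$ in $B$ is reachable from every node of $B$.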

Theorem 1. If for any node $v \in U$ not reachable from $r$ in $G_0$, we have

— either $v$ belongs to an Edmonds connected subgraph of $G_0$,

— or $v$ is connected to an Edmonds connected subgraph of $G_0$,

then we can apply an Edmonds-style primal-dual algorithm completing $G_0$ to get an r-branching cover
spanning $U$ without paying any additional ratio.

Proof. We will prove that if there still exist nodes in $U$ not reachable from $r$ in $G_0$, then there
always exists an Edmonds connected subgraph, say $B$, that is uncovered, i.e. $\delta^-_{G_0}(B) = \emptyset$. Choosing any
node $v_1 \in U$ not reachable from $r$ in $G_0$, we can see that in both cases, the Edmonds connected
subgraph, say $B_1$, of $G_0$ containing $v_1$ or to which $v_1$ is connected, is not reachable from $r$. In this
sense we suppose that $B_1$ is maximal. If $B_1$ is uncovered, we are done. If $B_1$ is covered then it
should be covered by an arc from a node $v_2 \in U$ not reachable from $r$, because if $v_2 \notin U$ then
$B_1 \cup \{v_2\}$ induces an Edmonds connected subgraph, which contradicts the fact that $B_1$ is maximal.
Similarly, we should have $v_2 \neq v_1$, because otherwise $B_1 \cup \{v_1\}$ induces an Edmonds connected
subgraph. We continue this reasoning with $v_2$; if this process does not stop, we will meet another
node $v_3 \in U \setminus \{v_1, v_2\}$ not reachable from $r$, and so on. As $|U| \le n - 1$, this process should end
with an uncovered Edmonds connected subgraph $B_k$.

We can then apply a primal-dual Edmonds-style algorithm (with respect to the reduced costs modified
by the earlier determination of $U$ and $G_0$) which in each iteration covers an uncovered
Edmonds connected subgraph in $G_0$, until every node in $U$ is reachable from $r$. By definition of
Edmonds connected subgraphs, in the output r-branching cover, we can choose only one arc entering
the chosen Edmonds connected subgraph, and it is enough to cover the nodes belonging to $U$ in this
subgraph.

Algorithm overview. Based on the above observations on DTC($G$) and its dual, we design an
algorithm which is a composition of 3 phases. Phases I and II determine $G_0$ and a vertex cover $U$
satisfying the conditions stated in Theorem 1. The details of each phase are as follows:

— Phase I is of a primal-dual style and tries to cover the sets $S \in \mathcal{F}$ such that $|S| = 2$. We
keep a set $A_0$ of arcs of zero reduced cost and the subgraph $G_0$ induced by $A_0$. $A_0$ is a cover but does
not necessarily contain an r-branching cover. After this phase we determine a vertex cover (i.e.
a node cover) $U$ of $G$. Phase I outputs a partial solution which is a directed tree rooted
in $r$ spanning the nodes in $U$ reachable from $r$ in $G_0$. It also outputs a dual feasible solution $y$.

— Phase II is executed only if $A_0$ does not contain an r-branching cover, i.e. there are nodes in $U$
determined in Phase I which are not reachable from $r$ in $G_0$. Phase II works with the reduced
costs issued from Phase I and tries to make each node in $U$ not reachable from $r$ in $G_0$ either
reachable from $r$ in $G_0$, or belong or be connected to an Edmonds connected subgraph in $G_0$.
Phase II transforms this problem into a kind of Set Cover Problem and solves it by a greedy
algorithm. Phase II outputs a set of arcs $T_0$ and grows the dual solution $y$ issued from Phase I
(by growing only the zero-valued components of $y$).

— Phase III is executed only if $T_0$ is not an r-branching cover. Phase III applies a primal-dual
Edmonds-style algorithm (with respect to the reduced costs issued from Phases I and II) which
in each iteration covers an uncovered Edmonds connected subgraph in $G_0$, until every
node in $U$ is reachable from $r$.

4.2 Initialization 

Set $\mathcal{B}$ to be the collection of the vertex sets of all the arcs in $A$ which do not have $r$ as an end vertex.
In other words, $\mathcal{B}$ contains all the sets of cardinality 2 in $\mathcal{F}$, i.e. $\mathcal{B} = \{S \mid S \in \mathcal{F} \text{ and } |S| = 2\}$. Set
the dual variable to zero, i.e. $y \leftarrow 0$, and set the reduced cost $\bar{c}$ to $c$, i.e. $\bar{c} \leftarrow c$. Set $A_0 \leftarrow \{e \in
A \mid \bar{c}(e) = 0\}$. Let $G_0 = (V_0, A_0)$ be the subgraph of $G$ induced by $A_0$.

During the algorithm, we will keep and constantly update a subset $T_0 \subseteq A_0$. At this stage of
initialization, we set $T_0 \leftarrow \emptyset$.

During Phase I, we also keep updating a dual feasible solution $y$ that is initialized at 0 (i.e. all
the components of $y$ are equal to 0). The dual solution $y$ is not necessary in the construction of an
r-branching cover, but we will need it in the proof of the performance guarantee of the algorithm.

4.3 Phase I 

In this phase, we will progressively expand $A_0$ so that it covers all the sets in $\mathcal{B}$. In the meantime,
during the expansion of $A_0$, we add the vertex sets of newly created strongly connected components
of $G_0$ to $\mathcal{B}$.

Phase I repeats the following steps until $\mathcal{B}$ becomes empty.

1. Select a set $S \in \mathcal{B}$ which is not covered by $A_0$.

2. Select the cheapest (reduced cost) arc(s) in $\delta^-(S)$ and add it (them) to $A_0$. $A_0$ then covers $S$.
Let $\alpha$ denote the reduced cost of the cheapest arc(s) chosen above; we modify the reduced
costs of the arcs in $\delta^-(S)$ by subtracting $\alpha$ from them. Set $y_S \leftarrow \alpha$.

3. Remove $S$ from $\mathcal{B}$ and, if the addition of new arcs to $A_0$ creates a strongly connected
component $K$ in $G_0$, add the set $V(K)$ (taken in the original graph $G$) to $\mathcal{B}$.
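
The three steps of Phase I can be sketched in Python as follows. This is a simplified reading, not the author's implementation: sets that are already covered (or that have no entering arc in $G$) are simply discarded, strongly connected components are found by a naive reachability computation, and costs are assumed to be integers or `Fraction`s so that the test `== 0` is exact. All function names are illustrative.

```python
def reachable(adj, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def nontrivial_sccs(nodes, arcs):
    """Strongly connected components with at least two nodes (naive method)."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, []).append(v)
    reach = {v: reachable(adj, v) for v in nodes}
    comps = {frozenset(u for u in reach[v] if v in reach[u]) for v in nodes}
    return {K for K in comps if len(K) > 1}


def phase_one(nodes, cost, r):
    """cost maps arcs (u, v) to exact non-negative numbers. Returns (A0, y, reduced)."""
    reduced = dict(cost)
    A0 = {e for e, c in reduced.items() if c == 0}
    y = {}
    worklist = [frozenset(e) for e in cost if r not in e and e[0] != e[1]]
    seen = set(worklist)
    while worklist:
        S = worklist.pop()
        entering = [e for e in cost if e[0] not in S and e[1] in S]
        if not entering or any(e in A0 for e in entering):
            continue                      # already covered, or cannot be covered
        alpha = min(reduced[e] for e in entering)
        y[S] = alpha                      # raise the dual variable of S
        for e in entering:
            reduced[e] -= alpha
            if reduced[e] == 0:
                A0.add(e)                 # saturated arcs join A0
        for K in nontrivial_sccs(nodes, A0):
            if r not in K and K not in seen:
                seen.add(K)
                worklist.append(K)        # newly created component to cover
    return A0, y, reduced
```

A usage example: with `cost = {("r","a"): 2, ("a","b"): 0, ("b","a"): 0, ("c","a"): 1, ("b","c"): 5}` and root `"r"`, the only set whose dual variable is raised is `{a, b}`, which saturates the arc `("c", "a")`.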

Proposition 1. After Phase I, $A_0$ is a cover.

Proof. As we can see, Phase I terminates when $\mathcal{B}$ becomes empty. That means that the node sets of the
arcs which do not have $r$ as an end-node are all covered by $A_0$. Also, all the strongly connected
components in $G_0$ are covered. □

At this stage, if for a node $v$ there is a path from $r$ to $v$ in $G_0$, we say that $v$ is reachable from
$r$. Set $T_0$ to be a directed tree (rooted in $r$) in $G_0$ spanning the nodes reachable from $r$. $T_0$ is
chosen such that for each strongly connected component $K$ added to $\mathcal{B}$ in Phase I, there is
exactly one arc in $T_0$ entering $K$, i.e. $|\delta^-(K) \cap T_0| = 1$. If the nodes reachable from $r$ in $G_0$ form
a vertex cover, then $T_0$ is an r-branching cover and the algorithm stops. Otherwise, it goes to Phase II.



4.4 Phase II 

Let us consider the nodes which are not reachable from $r$ in $G_0$. We divide them into the following three
categories:

— The nodes $i$ such that $|\delta^-_{G_0}(i)| = 0$, i.e. there is no arc in $A_0$ entering $i$. Let us call these nodes
source nodes.

— The nodes $i$ such that $|\delta^-_{G_0}(i)| = 1$, i.e. there is exactly one arc in $A_0$ entering $i$. Let us call
these nodes sink nodes.

— The nodes $i$ such that $|\delta^-_{G_0}(i)| \ge 2$, i.e. there are at least two arcs in $A_0$ entering $i$. Let us call
these nodes critical nodes.
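
The three categories can be computed directly from $A_0$, as in the following Python sketch. The function name and the dictionary-based return value are assumptions for illustration; self-loops are ignored when counting entering arcs.

```python
def classify_unreachable(nodes, A0, r):
    """Label each node not reachable from r in G_0 as source, sink or critical."""
    adj, indegree = {}, {v: 0 for v in nodes}
    for u, v in A0:
        adj.setdefault(u, []).append(v)
        if u != v:
            indegree[v] += 1
    # Nodes reachable from r in G_0 (depth-first search).
    seen, stack = {r}, [r]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    labels = {}
    for v in nodes:
        if v in seen:
            continue
        if indegree[v] == 0:
            labels[v] = "source"
        elif indegree[v] == 1:
            labels[v] = "sink"
        else:
            labels[v] = "critical"
    return labels


A0 = {("r", "a"), ("b", "c"), ("b", "d"), ("c", "d")}
print(classify_unreachable(["r", "a", "b", "c", "d"], A0, "r"))
```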

Proposition 2. The set of the source nodes is a stable set.

Proof. Suppose the contrary; then there is an arc $(i, j)$ such that $i$ and $j$ are both source nodes.
As $\delta^-_{G_0}(i) = \delta^-_{G_0}(j) = \emptyset$, we have $\delta^-_{G_0}(\{i, j\}) = \emptyset$. Hence, $(i, j)$ is not covered by $A_0$, a contradiction.

Corollary 3. The set $U$ containing the nodes reachable from $r$ in $G_0$ after Phase I, the sink nodes
and the critical nodes is a vertex cover (i.e. a node cover) of $G$.

Proposition 3. For any sink node $j$, there is at least one critical node $i$ such that $j$ is connected
to $i$ in $G_0$.

Proof. Let the unique arc in $\delta^-_{G_0}(j)$ be $(i_1, j)$. Since this arc should be covered by $A_0$, $\delta^-_{G_0}(i_1) \neq \emptyset$.
If $|\delta^-_{G_0}(i_1)| \ge 2$ then $i_1$ is a critical node and we are done. Otherwise, $|\delta^-_{G_0}(i_1)| = 1$ and $i_1$ is
a sink node. Let $(i_2, i_1)$ be the unique arc in $\delta^-_{G_0}(i_1)$; we then repeat the same reasoning for $(i_2, i_1)$
and for $i_2$. If this process does not end with a critical node, it should meet each time a new sink
node not visited before. (It is not possible that a directed cycle is created, since then this directed
cycle (a strongly connected component) should have been covered in Phase I, and hence at least one of the
nodes on the cycle has two arcs entering it and is therefore critical.) As the number of sink nodes
is at most $n - 1$, the process cannot continue infinitely and should end at a stage $k$ ($k < n$) where
$i_k$ is a critical node. By construction, the path $i_k, i_{k-1}, \ldots, i_1, j$ is a path in $G_0$ from $i_k$ to $j$.



A critical node $v$ is said to be covered if there is at least one arc $(w, v) \in A_0$ such that $w$ is not a
source node, i.e. $w$ can be a sink node, a critical node or a node reachable from $r$. Otherwise, we
say $v$ is uncovered.

Proposition 4. If all critical nodes are covered then for any critical node $v$, one of the following
holds:

— either $v$ belongs to an Edmonds connected subgraph of $G_0$ or $v$ is connected to an Edmonds
connected subgraph of $G_0$,

— there is a path from $r$ to $v$ in $G_0$, i.e. $v$ is reachable from $r$ in $G_0$.

Proof. If $v$ is covered by a node reachable from $r$, we are done. Otherwise, $v$ is covered by a sink
node or by another critical node. From Proposition 3 we derive that in both cases, $v$ will be
connected to a critical node $w$, i.e. there is a path from $w$ to $v$ in $G_0$. Continuing this reasoning with
$w$ and so on, we should end with a node reachable from $r$ or a critical node visited before. In the
first case $v$ is reachable from $r$. In the second case, $v$ belongs to a directed cycle in $G_0$ if we have
revisited $v$; otherwise $v$ is connected to a directed cycle in $G_0$. The directed cycle in both cases
is an Edmonds connected subgraph (because it is strongly connected) and it can be included in a
greater Edmonds connected subgraph.

Lemma 2. If all critical nodes are covered then for any node $v \in U$ not reachable from $r$ in $G_0$,

— either $v$ belongs to an Edmonds connected subgraph of $G_0$,

— or $v$ is connected to an Edmonds connected subgraph of $G_0$.

Proof. The lemma is a direct consequence of Propositions 3 and 4.

The aim of Phase II is to cover all the uncovered critical nodes. Let us see how to convert this
problem into a weighted SCP and to solve the latter by adapting the well-known greedy algorithm
for weighted SCP.

A source node $s$ is zero connecting a critical node $v$ (reciprocally, $v$ is zero connected from $s$) if
$(s, v) \in A_0$. If $(s, v) \notin A_0$ but $(s, v) \in A$ then $s$ is positively connecting $v$ (reciprocally, $v$ is positively
connected from $s$).

Suppose that at the end of Phase I, there are $k$ uncovered critical nodes $v_1, v_2, \ldots, v_k$ and $p$ source
nodes $s_1, s_2, \ldots, s_p$. Let $S = \{s_1, s_2, \ldots, s_p\}$ denote the set of the source nodes.

Remark 1. An uncovered critical node $v$ can only be covered:

— directly by an arc from a sink node or another critical node to $v$,

— or via a source node $s$ connecting (zero or positively) $v$, i.e. by two arcs: an arc in $\delta^-(s)$ and
the arc $(s, v)$.

Remark 1 suggests that we can consider every critical node $v$ as a ground element to be covered
in a Set Cover instance, and that the subsets containing $v$ could be the singleton $\{v\}$ and any subset
containing $v$ of the set of the critical nodes connected (positively or zero) from a source node $s$. The cost of the
singleton $\{v\}$ is the minimum reduced cost of the arcs from a sink node or another critical node
to $v$. The cost of a subset $T$ containing $v$ of the set of the critical nodes connected from $s$ is the
minimum reduced cost of the arcs in $\delta^-(s)$ plus the sum of the reduced costs of the arcs $(s, w)$ for
all $w \in T$.

Precisely, in Phase II, we proceed to cover all the uncovered critical nodes by using the greedy
algorithm to solve the following instance of the Set Cover Problem:

The ground set contains $k$ elements, which are the critical nodes $v_1, v_2, \ldots, v_k$.
The subsets are

Type I. For each source node $s_i$, $i = 1, \ldots, p$, let $C(s_i)$ be the set of all the critical nodes
connected (positively or zero) from $s_i$. The subsets of Type I associated to $s_i$ are the subsets
of $C(s_i)$ ($C(s_i)$ included). To define their cost, we define

\[
\bar{c}(s_i) =
\begin{cases}
\min\{\bar{c}(e) \mid e \in \delta^-(s_i)\} & \text{if } \delta^-(s_i) \neq \emptyset, \\
+\infty & \text{otherwise.}
\end{cases}
\]

Let us choose an arc $e_{s_i} = \operatorname{argmin}\{\bar{c}(e) \mid e \in \delta^-(s_i)\}$, which denotes an arc entering $s_i$ of
minimum reduced cost. Let $T$ be any subset of Type I associated to $s_i$; we define the
cost of $T$ as $\bar{c}(T) = \bar{c}(s_i) + \sum_{v \in T} \bar{c}(s_i, v)$. Let us call the arc subset containing the arc $e_{s_i}$ and
the arcs $(s_i, v)$ for all uncovered $v \in T$ the covering arc subset of $T$.

Type II. The singletons $\{v_1\}, \{v_2\}, \ldots, \{v_k\}$. We define the cost of the singleton $\{v_i\}$ as

\[
\bar{c}(v_i) =
\begin{cases}
\min\{\bar{c}(w, v_i) \mid w \text{ is not a source node, i.e. } w \in V \setminus S\} & \text{if } (V \setminus S : \{v_i\}) \neq \emptyset, \\
+\infty & \text{otherwise.}
\end{cases}
\]

Let us choose an arc $e_{v_i} = \operatorname{argmin}\{\bar{c}(w, v_i) \mid w \text{ is not a source node, i.e. } w \in V \setminus S\}$, which
denotes an arc entering $v_i$ from a non-source node of minimum reduced cost. Let the singleton
$\{e_{v_i}\}$ be the covering arc subset of $\{v_i\}$.

We will show that we can adapt the greedy algorithm solving this set cover problem to our primal-dual
scheme. In particular, we will specify how to update the dual variables and the sets $A_0$ and $T_0$ in
each iteration of the greedy algorithm. The sketch of the algorithm is given in Algorithm 1.
Note that in Phase II, contrary to Phase I, the reduced costs $\bar{c}$ are not modified and all the
computations are based on the reduced costs $\bar{c}$ issued from Phase I. In the sequel, we will specify
how to compute the most efficient subset $\Delta$ and update the dual variables.

Algorithm 1: Greedy algorithm for Phase II

1 while there exist uncovered critical nodes do
2     Compute the most efficient subset $\Delta$;
3     Update the dual variables and the sets $A_0$ and $T_0$;
4     Change the status of the uncovered critical nodes in $\Delta$ to covered
5 end
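
For orientation, the classical greedy algorithm for weighted set cover, which Algorithm 1 adapts, can be sketched as follows. This is the textbook greedy on an explicitly listed collection of subsets, not the Phase II procedure itself (which never lists the exponentially many Type I subsets and also updates dual variables). The function name and the input format are assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Textbook greedy for weighted set cover.

    subsets maps a name to a pair (set_of_elements, cost).
    Returns the list of chosen names and their total cost.
    """
    uncovered = set(universe)
    chosen, total = [], 0
    while uncovered:
        useful = [name for name, (members, _) in subsets.items() if members & uncovered]
        if not useful:
            raise ValueError("the instance has no cover")
        # Efficiency = cost per newly covered element; smaller is better.
        best = min(useful, key=lambda n: subsets[n][1] / len(subsets[n][0] & uncovered))
        chosen.append(best)
        total += subsets[best][1]
        uncovered -= subsets[best][0]
    return chosen, total


instance = {"A": ({1, 2, 3}, 6), "B": ({1, 2}, 2), "C": ({3}, 1)}
print(greedy_set_cover({1, 2, 3}, instance))
```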

For $1 \le i \le p$, let us call $\mathcal{S}_i$ the collection of all the subsets of Type I associated to $s_i$. Let $\mathcal{S}$ be the
collection of all the subsets of Type I and II.

Computing the most efficient subset. Given a source node $s_i$, while the number of subsets
in $\mathcal{S}_i$ can be exponential, we will show in the following that computing the most efficient subset in
$\mathcal{S}_i$ can be done in polynomial time. Let us suppose that there are $i_q$ critical nodes, denoted by
$v_i^1, \ldots, v_i^{i_q}$, which are connected (positively or zero) from $s_i$. In addition, we suppose without
loss of generality that $\bar{c}(s_i, v_i^1) \le \bar{c}(s_i, v_i^2) \le \ldots \le \bar{c}(s_i, v_i^{i_q})$. We compute $f_i$ and $S_i$, which denote
respectively the best efficiency and the most efficient set in $\mathcal{S}_i$, by the following algorithm.

Step 1. Suppose that $v_i^h$ is the first uncovered critical node met when we scan the critical nodes
$v_i^1, v_i^2, \ldots, v_i^{i_q}$ in this order.
Set $S_i \leftarrow \{v_i^h\}$. Set $\bar{c}(S_i) \leftarrow \bar{c}(s_i) + \bar{c}(s_i, v_i^h)$.
Set $d_i \leftarrow 1$. Set $f_i \leftarrow \bar{c}(S_i) / d_i$ and $\Delta_i \leftarrow S_i$.

Step 2. We progressively add uncovered critical nodes $v_i^j$ for $j = h + 1, \ldots, i_q$ to $S_i$ as long as this
improves the efficiency of $S_i$.
For $j \leftarrow h + 1$ to $i_q$, if $v_i^j$ is uncovered and $f_i > \frac{\bar{c}(S_i) + \bar{c}(s_i, v_i^j)}{d_i + 1}$ then
$f_i \leftarrow \frac{\bar{c}(S_i) + \bar{c}(s_i, v_i^j)}{d_i + 1}$, $\bar{c}(S_i) \leftarrow \bar{c}(S_i) + \bar{c}(s_i, v_i^j)$, $d_i \leftarrow d_i + 1$
and $S_i \leftarrow S_i \cup \{v_i^j\}$.
Set $i_{\min} \leftarrow \operatorname{argmin}\{f_i \mid s_i \text{ is a source node}\}$.

Choose the most efficient subset among $S_{i_{\min}}$ and the singletons of Type II, for which the computation
of efficiency is straightforward. Set $\Delta$ to be the most efficient subset and set $d \leftarrow |\Delta|$, the number of
uncovered critical nodes in $\Delta$.
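
Steps 1 and 2 for a single source node can be sketched in Python as below. The function name and argument names are hypothetical: `entry_cost` plays the role of $\bar{c}(s_i)$, `arc_cost[v]` the role of $\bar{c}(s_i, v)$, and efficiency is the cost per uncovered critical node (smaller is better).

```python
import math


def best_type1_subset(entry_cost, arc_cost, uncovered):
    """Most efficient Type I subset for one source node.

    Returns (chosen_nodes, efficiency); (set(), inf) if nothing can be covered.
    """
    # Uncovered critical nodes reachable from the source, cheapest arc first.
    candidates = sorted((v for v in arc_cost if v in uncovered), key=lambda v: arc_cost[v])
    if not candidates or math.isinf(entry_cost):
        return set(), math.inf
    # Step 1: start with the first uncovered critical node.
    first = candidates[0]
    chosen = {first}
    total = entry_cost + arc_cost[first]
    count = 1
    efficiency = total / count
    # Step 2: add further nodes only while the cost per node strictly decreases.
    for v in candidates[1:]:
        trial = (total + arc_cost[v]) / (count + 1)
        if efficiency > trial:
            total += arc_cost[v]
            count += 1
            efficiency = trial
            chosen.add(v)
    return chosen, efficiency


print(best_type1_subset(6, {"a": 0, "b": 2, "c": 10}, {"a", "b", "c"}))
```

Because the candidates are scanned in non-decreasing order of arc cost, only prefixes of the sorted list need to be considered, which is why the exponential family of Type I subsets never has to be enumerated.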

Updating the dual variables and the sets $A_0$ and $T_0$.

Let $g = \max\{|T| \mid T \in \mathcal{S}\}$ and let $H_g = 1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{g}$.

Remark 2. $g \le D^+$.

Given a critical node $v$, let $p_v$ denote the number of source nodes connecting $v$. Let $s_v^1, s_v^2, \ldots, s_v^{p_v}$
be these source nodes, ordered such that $\bar{c}(s_v^1, v) \le \bar{c}(s_v^2, v) \le \ldots \le \bar{c}(s_v^{p_v}, v)$. We define $S_v^j = \{v, s_v^1, \ldots, s_v^j\}$
for $j = 1, \ldots, p_v$. We can see that for $j = 1, \ldots, p_v$, $S_v^j \in \mathcal{F}$. Let $y_{S_v^j}$ be the dual variable associated
to the cut constraint $x(\delta^-(S_v^j)) \ge 1$. The dual variables will be updated as follows. For each critical
node $v$ uncovered in $\Delta$, we update the value of $y_{S_v^j}$ for $j = 1, \ldots, p_v$ so that $\sum_{j=1}^{p_v} y_{S_v^j} = \frac{\bar{c}(\Delta)}{H_g \times d}$.
This updating process progressively saturates the arcs $(s_v^j, v)$ for $j = 1, \ldots, p_v$. Details are given in
Algorithm 2. We add to $A_0$ and to $T_0$ the arcs in the covering arc subset of $\Delta$.
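
The harmonic number $H_g$ used to scale the dual updates can be computed exactly, as in the short Python sketch below (the function name is illustrative). For reference, the standard bounds $\ln g < H_g \le \ln g + 1$ hold for every integer $g \ge 1$, and the example checks them numerically for one value.

```python
from fractions import Fraction
import math


def harmonic(g):
    """Exact value of H_g = 1 + 1/2 + ... + 1/g as a Fraction."""
    return sum(Fraction(1, i) for i in range(1, g + 1))


g = 4
h = harmonic(g)
print(h, math.log(g) < h <= math.log(g) + 1)
```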



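Algorithm 2: Updating the dual variables

1 $j \leftarrow 1$;
2 while ($j < p_v$) and ($\bar{c}(s_v^{j+1}, v) < \frac{\bar{c}(\Delta)}{H_g \times d}$) do
3     $y_{S_v^j} \leftarrow \bar{c}(s_v^{j+1}, v) - \bar{c}(s_v^j, v)$;
4     $j \leftarrow j + 1$;
5 end
6 if $\bar{c}(s_v^{p_v}, v) < \frac{\bar{c}(\Delta)}{H_g \times d}$ then
7     $y_{S_v^{p_v}} \leftarrow \frac{\bar{c}(\Delta)}{H_g \times d} - \bar{c}(s_v^{p_v}, v)$;
8 end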


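Let us define $\mathcal{T}$ as the set of the subsets $T$ such that $y_T$ is made positive in Phase II.

Lemma 3. The dual variables which were made positive in Phase II respect the reduced costs issued
from Phase I.

Proof. For every $T \in \mathcal{T}$, the arcs in $\delta^-(T)$ can only be either an arc in $\delta^-(s_i)$, where $s_i$ is a source
node, or an arc in $\delta^-(v)$, where $v$ is a critical node. Hence, we should show that for every arc $(u', u)$
where $u$ is either a critical node or a source node, we have

\[ \sum_{T \in \mathcal{T} \text{ s.t. } u \in T} y_T \le \bar{c}(u', u). \]

— $u$ is a critical node $v$ and $u'$ is the source node $s_v^j$. The possible subsets $T \in \mathcal{T}$ such that
$(s_v^j, v) \in \delta^-(T)$ are the sets $S_v^1, \ldots, S_v^{j-1}$. By Algorithm 2 we can see that

\[ \sum_{k=1}^{j-1} y_{S_v^k} \le \bar{c}(s_v^j, v). \]

— $u$ is a critical node $v$ and $u' \in V \setminus S$. By definition of $\bar{c}(v)$, we have $\bar{c}(u', u) \ge \bar{c}(v)$. By analogy
with the Set Cover problem, the dual variables made positive in Phase II respect the cost of
the singleton $\{u\}$. Hence

\[ \sum_{T \in \mathcal{T} \text{ s.t. } v \in T} y_T \le \bar{c}(v) \le \bar{c}(u', u). \]

— $u$ is a source node and $u' \in V \setminus S$. For each critical node $w$ such that $(u, w) \in A$, we suppose that
$u = s_w^{i(u,w)}$ where $1 \le i(u, w) \le p_w$. Let

\[ T_u = \{ w \mid w \text{ is a critical node, } (u, w) \in A \text{ and } y_{S_w^{i(u,w)}} > 0 \}. \]

We can see that $T_u \in \mathcal{S}$ and $\bar{c}(T_u) = \bar{c}(u) + \sum_{w \in T_u} \bar{c}(u, w)$. Suppose that $l$ is the total number
of iterations in Phase II. We should show that

\[ \sum_{k=1}^{l} \sum_{w \in T_u \cap \Delta_k} \left( \frac{\bar{c}(\Delta_k)}{H_g \times d_k} - \bar{c}(u, w) \right) \le \bar{c}(u), \]

where $\Delta_k$ is the subset which has been chosen in the $k$-th iteration. Let $a_k$ be the number of uncovered
critical nodes in $T_u$ at the beginning of the $k$-th iteration. We then have $a_1 = |T_u|$ and $a_{l+1} = 0$.
Let $A_k$ be the set of previously uncovered critical nodes of $T_u$ covered in the $k$-th iteration. We
immediately find that $|A_k| = a_k - a_{k+1}$. By Algorithm 1 we can see that at the $k$-th iteration
$\frac{\bar{c}(\Delta_k)}{d_k} \le \frac{\bar{c}(T_u)}{a_k}$. Since $|A_k| = a_k - a_{k+1}$, then

\[ \sum_{w \in T_u \cap \Delta_k} \left( \frac{\bar{c}(\Delta_k)}{H_g \times d_k} - \bar{c}(u, w) \right) \le \frac{\bar{c}(T_u)}{H_g} \times \frac{a_k - a_{k+1}}{a_k} - \sum_{w \in T_u \cap \Delta_k} \bar{c}(u, w). \]

Hence,

\[
\begin{aligned}
\sum_{k=1}^{l} \sum_{w \in T_u \cap \Delta_k} \left( \frac{\bar{c}(\Delta_k)}{H_g \times d_k} - \bar{c}(u, w) \right)
&\le \frac{\bar{c}(T_u)}{H_g} \sum_{k=1}^{l} \frac{a_k - a_{k+1}}{a_k} - \sum_{k=1}^{l} \sum_{w \in T_u \cap \Delta_k} \bar{c}(u, w) \\
&\le \frac{\bar{c}(T_u)}{H_g} \sum_{k=1}^{l} \left( \frac{1}{a_k} + \frac{1}{a_k - 1} + \ldots + \frac{1}{a_{k+1} + 1} \right) - \sum_{k=1}^{l} \sum_{w \in T_u \cap \Delta_k} \bar{c}(u, w) \\
&\le \frac{\bar{c}(T_u)}{H_g} \sum_{i=1}^{|T_u|} \frac{1}{i} - \sum_{k=1}^{l} \sum_{w \in T_u \cap \Delta_k} \bar{c}(u, w) \\
&\le \bar{c}(T_u) - \sum_{k=1}^{l} \sum_{w \in T_u \cap \Delta_k} \bar{c}(u, w) = \bar{c}(u).
\end{aligned}
\]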

Let Tq C To the set of the arcs added to Tq hi Phase II. For each e G Tg , let C2(e) be the part of 
the cost c(e) used in Phase II. 

Theorem 2. 

C2(To) = ^ C2(e) <HgY.yT< HD+) Vt 

eeTg TeT TgT 

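Proof. By Algorithm 2, at the $k$-th iteration, a subset $\Delta_k$ is chosen and we add the arcs in the
covering arc subset of $\Delta_k$ to $T_0$ for all $v \in \Delta_k$. Let $T_0^k$ be the covering arc subset of $\Delta_k$. We can see
that $c_2(T_0^k) = \sum_{e \in T_0^k} c_e = c(\Delta_k)$. In this iteration, we update the dual variables in such a way
that for each critical node $v \in \Delta_k$, $\sum_{i \ge 1} y_{S_v^i} = \frac{c(\Delta_k)}{H_g\, d_k}$ with $d_k = |\Delta_k|$. Together with the fact that
$c(\Delta_k) = c(w_k) + \sum_{v \in \Delta_k} c(w_k, v)$, we have $\sum_{v \in \Delta_k} \sum_{i \ge 1} y_{S_v^i} = \frac{c(\Delta_k)}{H_g} = \frac{c_2(T_0^k)}{H_g}$. Let $l$ be
the number of iterations in Phase II. Summing over the iterations, we obtain

$$\sum_{T \in \mathcal{T}} y_T = \sum_{k=1}^{l} \sum_{v \in \Delta_k} \sum_{j \ge 1} y_{S_v^j} = \sum_{k=1}^{l} \frac{c(\Delta_k)}{H_g} = \sum_{k=1}^{l} \frac{c_2(T_0^k)}{H_g} = \frac{c_2(T_0)}{H_g},$$

which proves that $c_2(T_0) = H_g \sum_{T \in \mathcal{T}} y_T$. By Remark 5 we have $g \le D^+$ and $H_g \approx \ln g$, hence
$c_2(T_0) \le \ln(D^+) \sum_{T \in \mathcal{T}} y_T$. □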
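The accounting in this proof mirrors the standard dual-fitting analysis of greedy weighted Set Cover: the cost of each chosen set is split evenly among the elements it newly covers, and dividing these shares by $H_g$ gives a feasible dual solution. The sketch below illustrates this on plain weighted Set Cover. It is not the paper's Algorithm 2, whose sets have the special source-node structure described earlier, and the names `greedy_set_cover`, `harmonic` and the toy instance are assumptions made for illustration.

```python
from fractions import Fraction


def harmonic(n):
    """Return H_n as an exact Fraction."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def greedy_set_cover(universe, sets):
    """Greedy weighted Set Cover.

    sets maps a name to (cost, frozenset_of_elements).
    Returns (chosen_names, price), where price[v] is the share of the
    chosen set's cost charged to element v when v is first covered.
    Raises ValueError if some element cannot be covered.
    """
    uncovered = set(universe)
    chosen, price = [], {}
    while uncovered:
        # Pick the set with the smallest cost per newly covered element.
        name, (cost, elems) = min(
            ((n, s) for n, s in sets.items() if s[1] & uncovered),
            key=lambda item: Fraction(item[1][0]) / len(item[1][1] & uncovered),
        )
        new = elems & uncovered
        for v in new:
            price[v] = Fraction(cost) / len(new)
        chosen.append(name)
        uncovered -= new
    return chosen, price


# Toy instance (hypothetical data).
sets = {
    "A": (3, frozenset({1, 2, 3})),
    "B": (1, frozenset({3, 4})),
    "C": (2, frozenset({1, 2})),
}
chosen, price = greedy_set_cover({1, 2, 3, 4}, sets)
g = max(len(elems) for _, elems in sets.values())  # largest set size
y = {v: p / harmonic(g) for v, p in price.items()}  # scaled dual values
```

On this instance the total cost of the chosen sets equals the sum of the prices, so it equals $H_g$ times the sum of the scaled dual values `y`, which is the identity used in the proof.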



4.5 Phase III 

We perform Phase III if, after Phase II, there exist nodes in U not reachable from r in $G_0$. By Lemma
2, they belong to or are connected to some Edmonds connected subgraphs of $G_0$. By Theorem 1, we can
apply an Edmonds-style primal-dual algorithm which tries to cover uncovered Edmonds connected
subgraphs of $G_0$ until all nodes in U are reachable from r. The algorithm repeatedly chooses an uncovered
Edmonds connected subgraph and adds to $A_0$ the cheapest (reduced cost) arc(s) entering it. As
the reduced costs have not been modified during Phase II, we first update the reduced cost $\bar{c}$ with
respect to the dual variables made positive in Phase II.

Algorithm 3: Algorithm for Phase III

Update the reduced cost $\bar{c}$ with respect to the dual variables made positive in Phase II;
repeat
    Choose B, an uncovered Edmonds connected subgraph;
    Let $y_B$ be the dual variable associated with B;
    Set $\bar{c}(B) \leftarrow \min\{\bar{c}_e \mid e \in \delta^-(B)\}$; Set $y_B \leftarrow \bar{c}(B)$;
    foreach $e \in \delta^-(B)$ do
        $\bar{c}_e \leftarrow \bar{c}_e - \bar{c}(B)$;
    end
    Update $A_0$, $G_0$ and $T_0$ (see below);
until every node in U is reachable from r;

To update $A_0$, at each iteration we add all the saturated arcs belonging to $\delta^-(B)$ to $A_0$. Among
these arcs, we choose only one arc $(u, v)$ with $v \in B$ to add to $T_0$, with a preference for a node u connected
from r in $G_0$. On the other hand, we delete the arc $(x, v)$ with $x \in B$ from $T_0$. We then add to $T_0$
a directed tree rooted at v in $G_0$ spanning B. If there are sink nodes directly connected to B, i.e.
the path from a critical node $w \in B$ to these nodes contains only sink nodes except w, we also add
all such paths to $T_0$.
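One iteration of the reduced-cost update in Algorithm 3, together with the choice of the single entering arc, can be sketched as follows. This is a minimal illustration under stated assumptions: reduced costs are stored in a dictionary keyed by arcs, costs are integers (so saturation can be tested with `== 0`), and the function names `phase3_step` and `pick_entering_arc` are hypothetical. Maintaining $G_0$ and the spanning tree of B is not covered.

```python
def phase3_step(reduced_cost, B):
    """Raise the dual variable of B until an arc entering B is saturated.

    reduced_cost: dict mapping arcs (u, v) to their current reduced cost;
                  it is updated in place.
    B: set of nodes of an uncovered Edmonds connected subgraph.
    Returns (y_B, saturated), where saturated lists the entering arcs
    whose reduced cost dropped to zero.
    """
    # delta^-(B): arcs with head inside B and tail outside B.
    entering = [(u, v) for (u, v) in reduced_cost if v in B and u not in B]
    if not entering:
        raise ValueError("no arc enters B")
    y_B = min(reduced_cost[e] for e in entering)
    for e in entering:
        reduced_cost[e] -= y_B
    saturated = [e for e in entering if reduced_cost[e] == 0]
    return y_B, saturated


def pick_entering_arc(saturated, reachable_from_r):
    """Choose one saturated arc, preferring a tail already reachable from r."""
    for u, v in saturated:
        if u in reachable_from_r:
            return (u, v)
    return saturated[0]


# Toy data (hypothetical): B = {a, b}, entered by (r, a) and (x, b).
costs = {("r", "a"): 4, ("x", "b"): 2, ("a", "b"): 1, ("b", "c"): 5}
y_B, saturated = phase3_step(costs, {"a", "b"})
arc = pick_entering_arc(saturated, reachable_from_r={"r"})
```

With floating-point costs the saturation test would need a tolerance instead of an exact comparison with zero.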

Lemma 4. After Phase III, $T_0$ is an r-branching cover.

Proof. We can see that after Phase III, for any critical node or sink node v, there is a path
from r to v containing only arcs in $T_0$, and there is exactly one arc in $\delta^-(v) \cap T_0$.
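The two properties used in this proof (every touched node other than r has exactly one entering tree arc and is reachable from r) together with the covering condition can be checked mechanically. The Python sketch below is an illustrative checker, not part of the paper's algorithm. The name `is_r_branching_cover` and the arc-list representation are assumptions, and the degenerate case of a tree with no arcs is treated as touching only r.

```python
from collections import deque


def is_r_branching_cover(arcs_G, tree_arcs, r):
    """Check that tree_arcs is a branching rooted at r touching every arc of G.

    arcs_G, tree_arcs: iterables of arcs (u, v).
    """
    tree_arcs = list(tree_arcs)
    nodes = {r} | {x for arc in tree_arcs for x in arc}
    indeg = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in tree_arcs:
        indeg[v] += 1
        children[u].append(v)

    # Branching: the root has no entering arc, every other node exactly one.
    if indeg[r] != 0 or any(indeg[v] != 1 for v in nodes if v != r):
        return False

    # Every touched node must be reachable from r along tree arcs.
    seen, queue = {r}, deque([r])
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    if seen != nodes:
        return False

    # Cover: each arc of G has its head or its tail touched by the tree.
    return all(u in nodes or v in nodes for u, v in arcs_G)


# Toy graph (hypothetical data).
G = [("r", "a"), ("a", "b"), ("b", "c"), ("c", "a"), ("d", "b")]
ok = is_r_branching_cover(G, [("r", "a"), ("a", "b")], "r")
```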

4.6 Performance guarantee 

We state now a theorem about the performance guarantee of the algorithm. 

Theorem 3. The cost of $T_0$ is at most $\max\{2, \ln(D^+)\}$ times the cost of an optimal r-branching
cover.

Proof. Suppose that $T^*$ is an optimal r-branching cover of G with respect to the cost c. First, we
can see that the solution y built by the algorithm is a feasible dual solution. Hence $c^T y \le c(T^*)$. Let
$\mathcal{B}$ be the set of all the subsets B considered in Phase I and Phase III (B is either a subset of cardinality 2,
or a subset whose induced subgraph is a strongly connected component or an Edmonds
connected subgraph in $G_0$ at some stage of the algorithm). Recall that we have defined $\mathcal{T}$ as the set
of the subsets T such that $y_T$ is made positive in Phase II. We have then

$$c^T y = \sum_{B \in \mathcal{B}} y_B + \sum_{T \in \mathcal{T}} y_T.$$

For any arc e in $T_0$, let us divide the cost $c(e)$ into two parts: $c_1(e)$, the part saturated by the dual
variables $y_B$ with $B \in \mathcal{B}$, and $c_2(e)$, the part saturated by the dual variables $y_T$ with $T \in \mathcal{T}$. Hence
$c(T_0) = c_1(T_0) + c_2(T_0)$. By Theorem 2, we have $c_2(T_0) \le \ln(D^+) \sum_{T \in \mathcal{T}} y_T$ (note that replacing,
in Phase III, an arc $(x, v)$ by another arc $(u, v)$ with $v \in B$ does not change the cost $c_2(T_0)$). Let
us consider any set $B \in \mathcal{B}$ produced by the algorithm. B is one of the following:

— $|B| = 2$. As $T_0$ is a branching, for every vertex $v \in V$ we have $|\delta^-(v) \cap T_0| \le 1$. Hence
$|\delta^-(B) \cap T_0| \le 2$.

— B is the vertex set of a strongly connected component or an Edmonds connected subgraph in $G_0$.
By construction of the algorithm, $|\delta^-(B) \cap T_0| = 1$.

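These observations lead to the conclusion that $c_1(T_0) \le 2 \sum_{B \in \mathcal{B}} y_B$. Hence

$$\begin{aligned}
c(T_0) = c_1(T_0) + c_2(T_0) &\le 2 \sum_{B \in \mathcal{B}} y_B + \ln(D^+) \sum_{T \in \mathcal{T}} y_T \\
&\le \max\{2, \ln(D^+)\}\, c^T y \le \max\{2, \ln(D^+)\}\, c(T^*).
\end{aligned}$$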

Corollary 4. We can approximate the DTCP within a $\max\{2, \ln(D^+)\}$ ratio.
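To get a feel for this ratio, note that $\ln(D^+)$ exceeds 2 only when $D^+ > e^2 \approx 7.39$, so the guarantee is the constant 2 for every graph whose maximum outgoing degree is at most 7. The short Python sketch below evaluates the ratio from a list of out-degrees. The helper name `dtcp_ratio` is illustrative only.

```python
import math


def dtcp_ratio(out_degrees):
    """Return max{2, ln(D+)} where D+ is the maximum outgoing degree."""
    d_plus = max(out_degrees)
    if d_plus <= 0:
        # A graph without arcs: the logarithmic term does not apply.
        return 2.0
    return max(2.0, math.log(d_plus))


# Hypothetical degree sequences.
small = dtcp_ratio([1, 3, 7])   # the constant 2 dominates
large = dtcp_ratio([2, 8, 5])   # ln(8) is slightly above 2
```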

5 Final remarks 

The paper has shown that the weighted Set Cover Problem is a special case of the Directed Tree
Cover Problem and that the latter can be approximated with a ratio of $\max\{2, \ln(D^+)\}$ (where $D^+$ is
the maximum outgoing degree of the nodes in G) by a primal-dual algorithm. Based on known
complexity results for weighted Set Cover, in one direction, this approximation seems to be best
possible.

In our opinion, an interesting question is whether the same techniques can be applied to design a
combinatorial approximation algorithm for Directed Tour Cover. As we have seen in the Introduction,
a $2\log_2(n)$-approximation algorithm for Directed Tour Cover has been given in [14], but
this algorithm is not combinatorial.